On the Road to High-Quality POS-Tagging
Internal identifier: 000971 (Main/Exploration); previous: 000970; next: 000972
Authors: Stefan Klatt [Austria]; Karel Oliva [Austria]
Source:
- Lecture Notes in Computer Science [ISSN 0302-9743]; 2005.
English descriptors
- Teeft:
- Analysis systems, Annotation, Better precision rate, Better results, Central bank, Computational linguistics, Corpus, Corpus frequency, Correct assignment, Correct reading, Correct readings, Corrua readings, Current tokenizers, Decision window, Entire article, Entity recognizer, Error rate, Error rates, External sector, Foreign material, Foreign words, Inflationary expectations, Input problems, Klatt, Length tokens, Lexical, Lexical analysis, Linguistic point, Linguistic rule, Linguistic rules, Linguistic tagger, Linguistic taggers, Many errors, Module, Module uwrb, Modules uwcb, Modules uwrb, More readings, Morphological analysis, Multiword units, Negra, Negra corpus, Next section, Finite verb, Noun, Oliva, Original sense, Other words, Output problems, Parameterizable threshold, Pipeline architecture, Precision rate, Processing architecture, Proper nouns, Quotation marks, Relative pronoun, Second case, Single quotation mark, Statistical tagger, Statistical taggers, Such tokens, Tagger, Test corpus, Text material, Tokenization, Training material, Ungrammatical readings, Unknown tokens, Unknown word, Unknown words, Uwrb, Verb reading, Whole sentential context, Wieder einmal, Word vergessen.
Abstract
In this paper, we present techniques aimed at avoiding typical errors of state-of-the-art POS-taggers and at constructing high-quality POS-taggers with extremely low error rates. Such taggers are very helpful, if not necessary, for many NLP applications organized in a pipeline architecture. The appropriateness of the suggested solutions is demonstrated in several experiments. Although these experiments were performed only on German data, the proposed modular architecture is applicable to many other languages as well.
Url: https://api.istex.fr/document/96C50054BE9FF5B6C161E8EC182C63333573ACCB/fulltext/pdf
DOI: 10.1007/11551263_31
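Affiliations: Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010, Vienna (both authors)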
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000F61
- to stream Istex, to step Curation: 000E84
- to stream Istex, to step Checkpoint: 000770
- to stream Main, to step Merge: 000970
- to stream Main, to step Curation: 000971
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">On the Road to High-Quality POS-Tagging</title>
<author><name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
</author>
<author><name sortKey="Oliva, Karel" sort="Oliva, Karel" uniqKey="Oliva K" first="Karel" last="Oliva">Karel Oliva</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:96C50054BE9FF5B6C161E8EC182C63333573ACCB</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11551263_31</idno>
<idno type="url">https://api.istex.fr/document/96C50054BE9FF5B6C161E8EC182C63333573ACCB/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000F61</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000F61</idno>
<idno type="wicri:Area/Istex/Curation">000E84</idno>
<idno type="wicri:Area/Istex/Checkpoint">000770</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000770</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Klatt S:on:the:road</idno>
<idno type="wicri:Area/Main/Merge">000970</idno>
<idno type="wicri:Area/Main/Curation">000971</idno>
<idno type="wicri:Area/Main/Exploration">000971</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">On the Road to High-Quality POS-Tagging</title>
<author><name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
<affiliation wicri:level="3"><country xml:lang="fr">Autriche</country>
<wicri:regionArea>Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010, Vienna</wicri:regionArea>
<placeName><settlement type="city">Vienne (Autriche)</settlement>
<region nuts="2" type="province">Vienne (Autriche)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Autriche</country>
</affiliation>
</author>
<author><name sortKey="Oliva, Karel" sort="Oliva, Karel" uniqKey="Oliva K" first="Karel" last="Oliva">Karel Oliva</name>
<affiliation wicri:level="3"><country xml:lang="fr">Autriche</country>
<wicri:regionArea>Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010, Vienna</wicri:regionArea>
<placeName><settlement type="city">Vienne (Autriche)</settlement>
<region nuts="2" type="province">Vienne (Autriche)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Autriche</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Analysis systems</term>
<term>Annotation</term>
<term>Better precision rate</term>
<term>Better results</term>
<term>Central bank</term>
<term>Computational linguistics</term>
<term>Corpus</term>
<term>Corpus frequency</term>
<term>Correct assignment</term>
<term>Correct reading</term>
<term>Correct readings</term>
<term>Corrua readings</term>
<term>Current tokenizers</term>
<term>Decision window</term>
<term>Entire article</term>
<term>Entity recognizer</term>
<term>Error rate</term>
<term>Error rates</term>
<term>External sector</term>
<term>Foreign material</term>
<term>Foreign words</term>
<term>Inflationary expectations</term>
<term>Input problems</term>
<term>Klatt</term>
<term>Length tokens</term>
<term>Lexical</term>
<term>Lexical analysis</term>
<term>Linguistic point</term>
<term>Linguistic rule</term>
<term>Linguistic rules</term>
<term>Linguistic tagger</term>
<term>Linguistic taggers</term>
<term>Many errors</term>
<term>Module</term>
<term>Module uwrb</term>
<term>Modules uwcb</term>
<term>Modules uwrb</term>
<term>More readings</term>
<term>Morphological analysis</term>
<term>Multiword units</term>
<term>Negra</term>
<term>Negra corpus</term>
<term>Next section</term>
<term>Finite verb</term>
<term>Noun</term>
<term>Oliva</term>
<term>Original sense</term>
<term>Other words</term>
<term>Output problems</term>
<term>Parameterizable threshold</term>
<term>Pipeline architecture</term>
<term>Precision rate</term>
<term>Processing architecture</term>
<term>Proper nouns</term>
<term>Quotation marks</term>
<term>Relative pronoun</term>
<term>Second case</term>
<term>Single quotation mark</term>
<term>Statistical tagger</term>
<term>Statistical taggers</term>
<term>Such tokens</term>
<term>Tagger</term>
<term>Test corpus</term>
<term>Text material</term>
<term>Tokenization</term>
<term>Training material</term>
<term>Ungrammatical readings</term>
<term>Unknown tokens</term>
<term>Unknown word</term>
<term>Unknown words</term>
<term>Uwrb</term>
<term>Verb reading</term>
<term>Whole sentential context</term>
<term>Wieder einmal</term>
<term>Word vergessen</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper, we present techniques aimed at avoiding typical errors of state-of-the-art POS-taggers and at constructing high-quality POS-taggers with extremely low error rates. Such taggers are very helpful, if not necessary, for many NLP applications organized in a pipeline architecture. The appropriateness of the suggested solutions is demonstrated in several experiments. Although these experiments were performed only on German data, the proposed modular architecture is applicable to many other languages as well.</div>
</front>
</TEI>
<affiliations><list><country><li>Autriche</li>
</country>
<region><li>Vienne (Autriche)</li>
</region>
<settlement><li>Vienne (Autriche)</li>
</settlement>
</list>
<tree><country name="Autriche"><region name="Vienne (Autriche)"><name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
</region>
<name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
<name sortKey="Oliva, Karel" sort="Oliva, Karel" uniqKey="Oliva K" first="Karel" last="Oliva">Karel Oliva</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000971 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000971 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Sarre |area= MusicSarreV3 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:96C50054BE9FF5B6C161E8EC182C63333573ACCB |texte= On the Road to High-Quality POS-Tagging }}
This area was generated with Dilib version V0.6.33.